The value iteration algorithm is not strongly polynomial for discounted dynamic programming
Authors
Abstract
This note provides a simple example demonstrating that, if exact computations are allowed, the number of iterations required for the value iteration algorithm to find an optimal policy for discounted dynamic programming problems may grow arbitrarily quickly with the size of the problem. In particular, the number of iterations can be exponential in the number of actions. Thus, unlike policy iteration, the value iteration algorithm is not strongly polynomial for discounted dynamic programming. © 2014 Elsevier B.V. All rights reserved.
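For readers unfamiliar with the algorithm, the sketch below runs standard value iteration on a generic finite discounted MDP; the two-state instance, rewards, and discount factors are hypothetical and are not the construction used in the note. Running it shows the iteration count growing as the discount factor approaches 1; the note's result is stronger, namely that with exact computation the count can even be exponential in the number of actions.

```python
import numpy as np

def value_iteration(P, R, gamma, tol=1e-10, max_iter=1_000_000):
    """Standard value iteration for a finite discounted MDP.

    P[a][s, s'] : transition probability from s to s' under action a
    R[a][s]     : expected one-step reward for action a in state s
    gamma       : discount factor in (0, 1)

    Returns the value estimate, a greedy policy, and the iteration count.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros(n_states)
    for k in range(1, max_iter + 1):
        # Bellman backup: Q[a, s] = R[a][s] + gamma * sum_{s'} P[a][s, s'] * V[s']
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        # Classical stopping rule: the greedy policy w.r.t. V_new is then tol-optimal.
        if np.max(np.abs(V_new - V)) < tol * (1 - gamma) / (2 * gamma):
            return V_new, Q.argmax(axis=0), k
        V = V_new
    return V, Q.argmax(axis=0), max_iter

# Hypothetical 2-state, 2-action instance, used only to show how the
# iteration count grows as the discount factor approaches 1.
P = [np.array([[1.0, 0.0], [0.0, 1.0]]),   # action 0: stay in place
     np.array([[0.0, 1.0], [1.0, 0.0]])]   # action 1: switch states
R = [np.array([0.0, 1.0]), np.array([0.5, 0.0])]

for gamma in (0.9, 0.99, 0.999):
    _, policy, iters = value_iteration(P, R, gamma)
    print(f"gamma={gamma}: greedy policy {policy}, {iters} iterations")
```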
Similar articles
Modified policy iteration algorithms are not strongly polynomial for discounted dynamic programming
This note shows that the number of arithmetic operations required by any member of a broad class of optimistic policy iteration algorithms to solve a deterministic discounted dynamic programming problem with three states and four actions may grow arbitrarily. Therefore any such algorithm is not strongly polynomial. In particular, the modified policy iteration and λ-policy iteration algorithms a...
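As a point of comparison, here is a minimal sketch of modified policy iteration, assuming the same MDP representation as in the earlier sketch; it is not the three-state, four-action instance analyzed in that paper. Each pass performs one greedy improvement followed by m partial policy-evaluation backups, so m = 0 recovers value iteration and large m approaches classical policy iteration.

```python
import numpy as np

def modified_policy_iteration(P, R, gamma, m=5, tol=1e-10, max_iter=100_000):
    """Modified policy iteration: one greedy improvement followed by m
    partial policy-evaluation backups per pass (illustrative sketch only)."""
    n_actions, n_states = len(P), P[0].shape[0]
    V = np.zeros(n_states)
    for k in range(1, max_iter + 1):
        # Improvement: greedy policy with respect to the current value estimate.
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        policy, V_new = Q.argmax(axis=0), Q.max(axis=0)
        # Partial evaluation: m extra backups with the policy held fixed.
        for _ in range(m):
            V_new = np.array([R[policy[s]][s] + gamma * P[policy[s]][s] @ V_new
                              for s in range(n_states)])
        if np.max(np.abs(V_new - V)) < tol:
            return policy, V_new, k
        V = V_new
    return policy, V, max_iter
```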
Accelerated decomposition techniques for large discounted Markov decision processes
Many hierarchical techniques to solve large Markov decision processes (MDPs) are based on the partition of the state space into strongly connected components (SCCs) that can be classified into some levels. In each level, smaller problems named restricted MDPs are solved, and then these partial solutions are combined to obtain the global solution. In this paper, we first propose a novel algorith...
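The first step of such a decomposition can be sketched as follows, assuming the MDP's transitions are given as a list of per-action matrices; the helper below only extracts the strongly connected components of the one-step reachability graph with SciPy and is not the full level-based algorithm proposed in that paper.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def scc_partition(P):
    """Group the states of an MDP into strongly connected components of the
    directed graph with an edge s -> s' whenever some action reaches s' from s
    with positive probability (illustrative sketch only)."""
    reach = (sum(P) > 0).astype(int)          # one-step reachability graph
    n_comp, labels = connected_components(csr_matrix(reach),
                                          directed=True, connection='strong')
    return [np.flatnonzero(labels == comp) for comp in range(n_comp)]

# On the hypothetical two-state MDP defined earlier, scc_partition(P)
# returns a single component containing both states.
```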
Recent Progress on the Complexity of Solving Markov Decision Processes
The complexity of algorithms for solving Markov Decision Processes (MDPs) with finite state and action spaces has seen renewed interest in recent years. New strongly polynomial bounds have been obtained for some classical algorithms, while others have been shown to have worst case exponential complexity. In addition, new strongly polynomial algorithms have been developed. We survey these result...
An approximation algorithm and FPTAS for Tardy/Lost minimization with common due dates on a single machine
This paper addresses the Tardy/Lost penalty minimization with common due dates on a single machine. According to this performance measure, if the tardiness of a job exceeds a predefined value, the job will be lost and penalized by a fixed value. Initially, we present a 2-approximation algorithm and examine its worst case ratio bound. Then, a pseudo-polynomial dynamic programming algorithm is de...
Global optimization of mixed-integer polynomial programming problems: A new method based on Grobner Bases theory
Mixed-integer polynomial programming (MIPP) problems are one class of mixed-integer nonlinear programming (MINLP) problems where objective function and constraints are restricted to the polynomial functions. Although the MINLP problem is NP-hard, in special cases such as MIPP problems, an efficient algorithm can be extended to solve it. In this research, we propose an algorit...
Journal: Oper. Res. Lett.
Volume: 42, Issue: -
Pages: -
Published: 2014